A Comprehensive NLP System for Modern Standard Arabic and Modern Hebrew
Authors
Abstract
This paper presents a comprehensive NLP system by Melingo that has recently been developed for Arabic, based on Morfix, a previously developed, operational, and highly successful comprehensive Hebrew NLP system. The system includes modules for morphological analysis, context-sensitive lemmatization, vocalization, text-to-phoneme conversion, and a prosody (intonation) model based on syntactic analysis. It is employed in applications such as full-text search, information retrieval, text categorization, textual data mining, online contextual dictionaries, filtering, and text-to-speech applications in the fields of telephony and accessibility, and it could serve as a handy aid for non-fluent Arabic or Hebrew speakers. Modern Hebrew and Modern Standard Arabic share some unique Semitic linguistic characteristics. Yet until now, the two languages have been handled separately in Natural Language Processing circles, at both the academic and the applied levels. This paper reviews the major similarities and the minor dissimilarities between Modern Hebrew and Modern Standard Arabic from the NLP standpoint and emphasizes the benefit of developing and maintaining a unified system for both languages.
Related resources
Smoothing a Lexicon-based POS Tagger for Arabic and Hebrew
We propose an enhanced Part-of-Speech (POS) tagger for Semitic languages that treats Modern Standard Arabic (henceforth Arabic) and Modern Hebrew (henceforth Hebrew) using the same probabilistic model and architectural setting. We start out by porting an existing Hidden Markov Model POS tagger for Hebrew to Arabic by exchanging a morphological analyzer for Hebrew with Buckwalter's (2002) morphol...
A New Method for Extracting Named Entities in Classical Arabic
In Natural Language Processing (NLP) studies, developing resources and tools contributes to the breadth and effectiveness of research in each language. In recent years, Arabic Named Entity Recognition (ANER) has attracted the attention of NLP researchers because of its significant impact on other NLP tasks such as machine translation, information retrieval, question answering, query result...
Grapheme to phoneme conversion: an Arabic dialect case
We aim to develop a Speech-to-Speech translation system between Modern Standard Arabic and the Algiers dialect. Such a system must include a Text-to-Speech module, which in turn must include a Grapheme-to-Phoneme converter. The Algiers dialect is an Arabic dialect affected by most of the problems of Modern Standard Arabic in the NLP area. Furthermore, it could be considered as an under-resourced language becau...
Machine Translation between Hebrew and Arabic: Needs, Challenges and Preliminary Solutions
Modern Hebrew and Modern Standard Arabic, both Semitic languages, share many orthographic, lexical, morphological, syntactic and semantic similarities, but they are still not mutually comprehensible. Most native Hebrew speakers in Israel do not speak Arabic, and the vast majority of Arabs (outside Israel) do not speak Hebrew. Machine translation (MT) between these two languages has the potential...
Is Modern Hebrew Standard Average European? The View from European
In contrast with previous work emphasizing European influences on Modern Hebrew as compared to the Biblical Hebrew model adopted by the Hebrew revival movement, this article sets out to examine relevant typological features of Modern Hebrew in its own right. Taking the typological literature on Standard Average European as a starting point, it is argued that Modern Hebrew is in fact quite far f...